Instructors:
Zongxu Zhang (zongxu.zhang@stu.pku.edu.cn)
Zexian Zeng (zexianzeng@pku.edu.cn)
Affiliation:
Center for Quantitative Biology
Academy for Advanced Interdisciplinary Studies
Peking University
Lab Website:
🔗 http://cqb.pku.edu.cn/zenglab/
This materials is largely based on Pancreatic endocrinogenesis differential geometry, we thank the origional authors for their contribution.
Basic scRNA-seq Mapping Process
Overview of the standard single-cell RNA-sequencing (scRNA-seq) analysis workflow — from raw read alignment and expression matrix generation to downstream preprocessing steps.
R-based Workflow: Seurat
Introduction to the Seurat package for scRNA-seq data preprocessing and analysis, including:
Python-based Workflow: Scanpy
Overview of the Scanpy package as a Python counterpart to Seurat, supporting similar functions for data preprocessing, clustering, and visualization in a scalable, efficient framework.
💡 Note: This course will primarily focus on Python-based workflows for scRNA-seq data analysis.
Create a new enviroment with python
conda create --name dynamo python=3.11
Activate the enviroment
conda activate dynamo
Install the following packages
conda install -c conda-forge dynamo-release : to install latest version of dynamo
or, for effeicient installation, using mamba,
mamba install -c conda-forge dynamo-release
mamba install scanpy : to install scanpy for data io
pip install matplotlib==3.7.3 scipy==1.13.1 : to install dependicies of dynamo
pip install gseapy : to install GO enrichment package
If you are using jupyterlab, nb_conda_kernels will automatically register the kernel for you
mamba install ipykernel
# Check python version
!python --version
Python 3.11.10
# Check data exists
# Data collected from https://journals.biologists.com/dev/article/146/12/dev173849/19483/Comprehensive-single-cell-mRNA-profiling-reveals-a
!ls -ll -h /home/zongxu/Misc/lesson/scRNA-application/data/endocrinogenesis_day15.h5ad
-rw-rw-r-- 1 zongxu zongxu 51M Oct 30 2024 /home/zongxu/Misc/lesson/scRNA-application/data/endocrinogenesis_day15.h5ad
# ! source dynamo/bin/activate # in cluster
# Import core packages
import dynamo as dyn
import scanpy as sc
# Disable annoying warnings
import warnings
from numba import NumbaWarning
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=NumbaWarning)
warnings.filterwarnings("ignore", category=UserWarning)
import os
os.chdir('/home/zongxu/Misc/lesson/scRNA-application/')
# Adjust visualization
# from IPython.core.display import display, HTML
# display(HTML("<style>.container { width:90% !important; }</style>"))
# Direct show figure in jupyter
%matplotlib inline
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm /home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature warnings.warn(msg, errors.NumbaWarning) /home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature warnings.warn(msg, errors.NumbaWarning) /home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/numba/np/ufunc/dufunc.py:343: NumbaWarning: Compilation requested for previously compiled argument types ((uint32,)). This has no effect and perhaps indicates a bug in the calling code (compiling a ufunc more than once for the same signature warnings.warn(msg, errors.NumbaWarning)
# check the working dir
os.getcwd()
'/home/zongxu/Misc/lesson/scRNA-application'
# Check dynamo and all dependencies version
dyn.get_all_dependencies_version()
| package | umap-learn | typing-extensions | tqdm | statsmodels | setuptools | session-info | seaborn | scipy | scikit-learn | pynndescent | pre-commit | pandas | openpyxl | numpy | numdifftools | numba | networkx | matplotlib | loompy | get-version | dynamo-release | colorcet | anndata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| version | 0.5.6 | 4.12.2 | 4.66.6 | 0.14.4 | 75.1.0 | 1.0.0 | 0.13.2 | 1.13.1 | 1.5.2 | 0.5.13 | 4.0.1 | 2.2.3 | 3.1.5 | 1.23.5 | 0.9.41 | 0.60.0 | 3.4 | 3.7.3 | 3.0.6 | 3.5.5 | 1.3.2 | 3.1.0 | 0.10.9 |
# Set resolution/size, styling and format of figures
dyn.configuration.set_figure_params("dynamo", background="white", figsize=(3, 2))
# usually using sc.read_10x_mtx or sc.read_10x_h5
data_path = './data/endocrinogenesis_day15.h5ad'
adata = sc.read_h5ad(data_path)
adata
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score'
var: 'highly_variable_genes'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
# X for sorting counts matrix, typically in Compressed Sparse Row format
adata.X
<3696x27998 sparse matrix of type '<class 'numpy.float32'>' with 9298890 stored elements in Compressed Sparse Row format>
# see what exactly does X have
print(adata.X)
(0, 18) 2.0 (0, 25) 2.0 (0, 32) 14.0 (0, 35) 2.0 (0, 54) 1.0 (0, 76) 1.0 (0, 97) 1.0 (0, 104) 1.0 (0, 109) 3.0 (0, 135) 1.0 (0, 154) 1.0 (0, 156) 1.0 (0, 158) 1.0 (0, 171) 1.0 (0, 198) 2.0 (0, 216) 1.0 (0, 247) 1.0 (0, 257) 3.0 (0, 258) 1.0 (0, 264) 13.0 (0, 265) 2.0 (0, 274) 1.0 (0, 277) 1.0 (0, 278) 1.0 (0, 325) 4.0 : : (3695, 27067) 1.0 (3695, 27071) 1.0 (3695, 27072) 1.0 (3695, 27074) 1.0 (3695, 27084) 1.0 (3695, 27136) 1.0 (3695, 27140) 1.0 (3695, 27149) 3.0 (3695, 27153) 21.0 (3695, 27154) 11.0 (3695, 27160) 1.0 (3695, 27169) 1.0 (3695, 27184) 4.0 (3695, 27206) 1.0 (3695, 27209) 2.0 (3695, 27211) 1.0 (3695, 27218) 1.0 (3695, 27220) 1.0 (3695, 27221) 1.0 (3695, 27222) 3.0 (3695, 27230) 1.0 (3695, 27231) 2.0 (3695, 27243) 2.0 (3695, 27246) 8.0 (3695, 27256) 2.0
# convert it to a numpy array
adata.X.toarray()
array([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
adata.X.toarray().shape
(3696, 27998)
# different layer sorted for different gene counts,
# which is used for calculate RNA velocity
# to call this for you data, please refer to velocyto
adata.layers['spliced'], adata.layers['unspliced']
(<3696x27998 sparse matrix of type '<class 'numpy.float32'>' with 9298890 stored elements in Compressed Sparse Row format>, <3696x27998 sparse matrix of type '<class 'numpy.float32'>' with 3156504 stored elements in Compressed Sparse Row format>)
# adata.obs sorted cell meta data, include cluser and some specific score
adata.obs.head(3)
| clusters_coarse | clusters | S_score | G2M_score | |
|---|---|---|---|---|
| index | ||||
| AAACCTGAGAGGGATA | Pre-endocrine | Pre-endocrine | -0.224902 | -0.252071 |
| AAACCTGAGCCTTGAT | Ductal | Ductal | -0.014707 | -0.232610 |
| AAACCTGAGGCAATTA | Endocrine | Alpha | -0.171255 | -0.286834 |
# adata.var sorted gene meta data
adata.var.head(3)
| highly_variable_genes | |
|---|---|
| index | |
| Xkr4 | False |
| Gm37381 | NaN |
| Rp1 | NaN |
# adata.obsm sorted some coordinated of cell
adata.obsm
AxisArrays with keys: X_pca, X_umap
# known differention related genes in pancreas
pancreas_genes = [
"Hes1", "Nkx6-1", "Nkx2-2",
"Neurog3", "Neurod1", "Pax4",
"Pax6", "Arx", "Pdx1",
"Ins1", "Ins2", "Ghrl",
"Ptf1a", "Iapp", "Isl1",
"Sox9", "Gcg",
]
figure_path = './figure/'
# do preprocessing, analog to scanpy
# select 4000 top variavle genes and filter genes that are detected in less than 20 counts (shared)
# ensure all pancrease genes are in the list
dyn.pp.recipe_monocle(adata, n_top_genes=4000,
fg_kwargs={"shared_count": 20},
genes_to_append=pancreas_genes)
|-----? dynamo.preprocessing.deprecated is deprecated. |-----> recipe_monocle_keep_filtered_cells_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_cells_key=True |-----> recipe_monocle_keep_filtered_genes_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_filtered_genes_key=True |-----> recipe_monocle_keep_raw_layers_key is None. Using default value from DynamoAdataConfig: recipe_monocle_keep_raw_layers_key=True |-----> apply Monocole recipe to adata...
/tmp/ipykernel_43792/4080765733.py:4: DeprecationWarning: recipe_monocle is deprecated and will be removed in a future release. Please update your code to use the new replacement function. dyn.pp.recipe_monocle(adata, n_top_genes=4000,
|-----> ensure all cell and variable names unique. |-----> ensure all data in different layers in csr sparse matrix format. |-----> ensure all labeling data properly collapased |-----> filtering cells... |-----> 3696 cells passed basic filters. |-----> filtering gene... |-----> 6956 genes passed basic filters. |-----> calculating size factor... |-----> selecting genes in layer: X, sort method: SVR... |-----> size factor normalizing the data, followed by log1p transformation. |-----> Set <adata.X> to normalized data |-----> applying PCA ... |-----> <insert> X_pca to obsm in AnnData Object. |-----> cell cycle scoring... |-----> computing cell phase... |-----> [Cell Phase Estimation] completed [31.4415s] |-----> [Cell Cycle Scores Estimation] completed [0.4372s] |-----> [recipe_monocle preprocess] completed [7.9795s]
# Setting the model into stochastic mode, for cells may not at steady state
dyn.tl.dynamics(adata, model="stochastic")
|-----> dynamics_del_2nd_moments_key is None. Using default value from DynamoAdataConfig: dynamics_del_2nd_moments_key=False |-----------> removing existing M layers:[]... |-----------> making adata smooth... |-----> calculating first/second moments... |-----> [moments calculation] completed [21.4898s]
estimating gamma: 100%|██████████| 4000/4000 [04:08<00:00, 16.10it/s]
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase'
var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics'
obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores'
layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
obsp: 'distances', 'connectivities', 'moments_con'
# Dimension reduction, and then calculate UMAP basis
dyn.tl.reduceDimension(adata, n_pca_components=30)
|-----> retrieve data for non-linear dimension reduction... |-----? adata already have basis umap. dimension reduction umap will be skipped! set enforce=True to re-performing dimension reduction. |-----> [UMAP] completed [0.0040s]
# Project high dimensional velocity vectors onto
# UMAP low dimensional embeddings
dyn.tl.cell_velocities(adata, method="pearson", other_kernels_dict={"transform": "sqrt"})
# PCA low dimensional embeddings
dyn.tl.cell_velocities(adata, basis="pca")
|-----> [calculating transition matrix via pearson kernel with sqrt transform.] in progress: 100.0000%|-----> [calculating transition matrix via pearson kernel with sqrt transform.] completed [2.6657s] |-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%|-----> [projecting velocity vector to low dimensional embedding] completed [0.7203s] Using existing pearson_transition_matrix found in .obsp. |-----> [projecting velocity vector to low dimensional embedding] in progress: 100.0000%|-----> [projecting velocity vector to low dimensional embedding] completed [0.7704s]
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase'
var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca'
obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca'
layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S'
obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix'
# Plot the velocity streamline
dyn.pl.streamline_plot(adata,
color=["clusters"],
basis="umap",
show_legend="on data",
show_arrowed_spines=True,
save_show_or_return='save',
save_kwargs={
"prefix": "01_velocity_streamline", # prefix
"ext": "png", # file format
"path": figure_path, # save dir (default current working dir)
"dpi": 300, # resolution
"transparent": True,
"close": True,
"verbose": True
}
)
|-----------> plotting with basis key=X_umap |-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type Saving figure to ./figure/01_velocity_streamline_dyn_savefig.png... Done
# Plot the velocity vector per cell
dyn.pl.cell_wise_vectors(
adata,
color=["clusters"],
basis="umap",
show_legend="on data",
quiver_length=4,
quiver_size=4,
figsize=(5, 3),
show_arrowed_spines=False,
save_show_or_return='save',
save_kwargs={
"prefix": "02_cell_wise_velocity", # prefix
"ext": "png", # file format
"path": figure_path, # save dir (default current working dir)
"dpi": 300, # resolution
"transparent": True,
"close": True,
"verbose": True
}
)
|-----> X shape: (3696, 2) V shape: (3696, 2) |-----------> plotting with basis key=X_umap |-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type Saving figure to ./figure/02_cell_wise_velocity_dyn_savefig.png... Done
# Setting the progenitor cell types
progenitor = adata.obs_names[adata.obs.clusters.isin(["Ductal"])]
len(progenitor)
916
Then, lets dive into some basic differential geometry
# Estimate the vector field
dyn.vf.VectorField(adata, basis="pca", pot_curl_div=True)
dyn.vf.VectorField(adata, basis="umap", pot_curl_div=True)
# Calculate the speed, divergence, accelaration and curl
dyn.vf.speed(adata, basis="pca")
dyn.vf.divergence(adata, basis="pca")
dyn.vf.acceleration(adata, basis="pca")
dyn.vf.curl(adata, basis="umap")
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: PCA.
Vector field will be learned in the PCA space.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] completed [1.0299s]
|-----> Running ddhodge to estimate vector field based pseudotime in pca basis...
|-----> graphizing vectorfield...
|-----> incomplete neighbor graph info detected: connectivities and distances do not exist in adata.obsp, indices not in adata.uns.neighbors.
|-----> Neighbor graph is broken, recomputing....
|-----> Start computing neighbor graph...
|-----------> X_data is None, fetching or recomputing...
|-----> fetching X data from layer:None, basis:pca
|-----> method arg is None, choosing methods automatically...
|-----------> method ball_tree selected
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] completed [12.1489s]
|-----> Computing divergence...
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 14.69it/s]
|-----> [VectorField] completed [13.8660s]
|-----> VectorField reconstruction begins...
|-----> Retrieve X and V based on basis: UMAP.
Vector field will be learned in the UMAP space.
|-----> Generating high dimensional grids and convert into a row matrix.
|-----> Learning vector field with method: sparsevfc.
|-----> [SparseVFC] begins...
|-----> Sampling control points based on data velocity magnitude...
|-----> [SparseVFC] completed [3.8171s]
|-----> Running ddhodge to estimate vector field based pseudotime in umap basis...
|-----> graphizing vectorfield...
|-----------? nbrs_idx argument is ignored and recomputed because nbrs_idx is not None and return_nbrs=True
|-----------> calculating neighbor indices...
|-----> [ddhodge completed] completed [10.7601s]
|-----> Computing curl...
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 23880.32it/s]
|-----> Computing divergence...
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 25.93it/s]
|-----> [VectorField] completed [15.2468s]
Calculating divergence: 100%|██████████| 4/4 [00:00<00:00, 12.84it/s]
|-----> [Calculating acceleration] in progress: 100.0000%|-----> [Calculating acceleration] completed [0.1335s]
Calculating 2-D curl: 100%|██████████| 3696/3696 [00:00<00:00, 23821.17it/s]
# Draw figures
dyn.configuration.set_figure_params("dynamo", background="black")
import matplotlib.pyplot as plt
fig1, fig1_axes = plt.subplots(ncols=2, nrows=2, constrained_layout=True, figsize=(10, 6))
dyn.pl.cell_wise_vectors(
adata,
color="pca_ddhodge_potential",
pointsize=0.1,
alpha=0.7,
ax=fig1_axes[0, 0],
quiver_length=6,
quiver_size=6,
save_show_or_return="return",
background="black",
)
dyn.pl.grid_vectors(
adata,
color="speed_pca",
basis="umap",
ax=fig1_axes[0, 1],
quiver_length=12,
quiver_size=12,
save_show_or_return="return",
background="black",
)
dyn.pl.streamline_plot(
adata, color="divergence_pca", basis="umap", ax=fig1_axes[1, 0], save_show_or_return="return", background="black"
)
dyn.pl.streamline_plot(
adata, color="acceleration_pca", basis="umap", ax=fig1_axes[1, 1], save_show_or_return="return", background="black"
)
fig1.savefig('./figure/03_combined_plots.png')
plt.show()
|-----> X shape: (3696, 2) V shape: (3696, 2) |-----------> plotting with basis key=X_umap |-----------> plotting with basis key=X_umap |-----------> plotting with basis key=X_umap |-----------> plotting with basis key=X_umap
The acceleration and divergence accurately highlight hotspots, including a saddle point in ductal cells (negative divergence), exit from this state to early endocrine progenitors (high positive acceleration), the bifurcation point for late progenitors to differentiate into stable cell types (high acceleration and positive divergence), and stable cell types (negative divergence).
# Draw the potential cell states
dyn.pl.topography(
adata,
basis="umap",
background="white",
color=["clusters"],
streamline_color="black",
show_legend="on data",
save_show_or_return='save',
save_kwargs={
"prefix": "04_cell_state", # prefix
"ext": "png", # file format
"path": figure_path, # save dir (default current working dir)
"dpi": 300, # resolution
"transparent": True,
"close": True,
"verbose": True
}
)
|-----> Vector field for umap is but its topography is not mapped. Mapping topography now ... |-----------> plotting with basis key=X_umap |-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type Saving figure to ./figure/04_cell_state_dyn_savefig.png... Done
# Draw the state transition map
dyn.configuration.set_figure_params("dynamo", background="white")
dyn.pd.state_graph(adata, group='clusters', basis='pca', method='vf')
dyn.pl.state_graph(adata,
color=['clusters'],
group='clusters',
basis='umap',
show_legend='on data',
method='vf',
save_show_or_return='save',
save_kwargs={
"prefix": "05_cell_state_transition_map", # prefix
"ext": "png", # file format
"path": figure_path, # save dir (default current working dir)
"dpi": 300, # resolution
"transparent": True,
"close": True,
"verbose": True
}
)
|-----> Estimating the transition probability between cell types... |-----> Applying vector field |-----> [KDTree parameter preparation computation] in progress: 0.0000%|-----> [KDTree computation] completed [0.0013s] |-----> [iterate groups] in progress: 12.5000%
integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.21it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 128.36it/s]
|-----> [iterate groups] in progress: 25.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.98it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 128.61it/s]
|-----> [iterate groups] in progress: 37.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.94it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 124.88it/s]
|-----> [iterate groups] in progress: 50.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 22.68it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 134.18it/s]
|-----> [iterate groups] in progress: 62.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 70/70 [00:03<00:00, 19.83it/s] uniformly sampling points along a trajectory: 100%|██████████| 70/70 [00:00<00:00, 124.52it/s]
|-----> [iterate groups] in progress: 75.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 24.09it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 130.70it/s]
|-----> [iterate groups] in progress: 87.5000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:04<00:00, 21.34it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 131.20it/s]
|-----> [iterate groups] in progress: 100.0000%
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :] integration with ivp solver: 100%|██████████| 100/100 [00:05<00:00, 18.43it/s] uniformly sampling points along a trajectory: 100%|██████████| 100/100 [00:00<00:00, 122.76it/s]
|-----> [iterate groups] completed [55.1253s] |-----> [State graph estimation] completed [0.0020s] |-----------> plotting with basis key=X_umap |-----------> skip filtering clusters by stack threshold when stacking color because it is not a numeric type
/home/zongxu/miniconda3/envs/dynamo/lib/python3.11/site-packages/dynamo/prediction/state_graph.py:306: RuntimeWarning: invalid value encountered in divide grp_avg_time[i, :] /= grp_graph[i, :]
Saving figure to ./figure/05_cell_state_transition_map_dyn_savefig.png... Done
# Calculate the jacobian on our predefined genes
dyn.vf.jacobian(adata, regulators=pancreas_genes)
Transforming subset Jacobian: 100%|██████████| 3696/3696 [00:00<00:00, 44017.06it/s]
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca'
obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'
# Ngn3 activates Pax4 in progenitors, initiating pancreatic endocrinogenesis
dyn.pl.jacobian(
adata,
basis="umap",
regulators=[
"Neurog3",
],
effectors=["Pax4"],
alpha=1,
save_show_or_return='save',
save_kwargs={
"prefix": "06_reg_Ngn3_activate_Pax4", # prefix
"ext": "png", # file format
"path": figure_path, # save dir (default current working dir)
"dpi": 300, # resolution
"transparent": True,
"close": True,
"verbose": True
}
)
Saving figure to ./figure/06_reg_Ngn3_activate_Pax4_dyn_savefig.png... Done
# Jacobian analyses reveal mutual inhibition of Pax4 and Arx at the bifurcation point in progenitors.
dyn.pl.jacobian(adata, basis="umap", regulators=["Arx"], effectors=["Pax4"], alpha=1)
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Arx"], alpha=1)
<Figure size 600x400 with 0 Axes>
# Pdx1 activates Ins2 in beta cells.
dyn.pl.jacobian(adata, basis="umap", regulators=["Pax4"], effectors=["Ins2"], alpha=1)
adata.obs["clusters"].unique()
['Pre-endocrine', 'Ductal', 'Alpha', 'Ngn3 high EP', 'Delta', 'Beta', 'Ngn3 low EP', 'Epsilon'] Categories (8, object): ['Ductal', 'Ngn3 low EP', 'Ngn3 high EP', 'Pre-endocrine', 'Beta', 'Alpha', 'Delta', 'Epsilon']
# Rank the elements in the Jacobian
dyn.vf.acceleration(adata, basis='pca')
dyn.vf.rank_acceleration_genes(adata, groups='clusters')
|-----> [Calculating acceleration] in progress: 100.0000%|-----> [Calculating acceleration] completed [0.0734s]
AnnData object with n_obs × n_vars = 3696 × 27998
obs: 'clusters_coarse', 'clusters', 'S_score', 'G2M_score', 'nGenes', 'nCounts', 'pMito', 'pass_basic_filter', 'Size_Factor', 'initial_cell_size', 'unspliced_Size_Factor', 'initial_unspliced_cell_size', 'spliced_Size_Factor', 'initial_spliced_cell_size', 'ntr', 'cell_cycle_phase', 'pca_ddhodge_div', 'pca_ddhodge_potential', 'divergence_pca', 'control_point_pca', 'inlier_prob_pca', 'obs_vf_angle_pca', 'umap_ddhodge_div', 'umap_ddhodge_potential', 'curl_umap', 'divergence_umap', 'control_point_umap', 'inlier_prob_umap', 'obs_vf_angle_umap', 'speed_pca', 'acceleration_pca', 'jacobian_det_pca'
var: 'highly_variable_genes', 'nCells', 'nCounts', 'pass_basic_filter', 'log_cv', 'log_m', 'score', 'frac', 'use_for_pca', 'ntr', 'beta', 'gamma', 'half_life', 'alpha_b', 'alpha_r2', 'gamma_b', 'gamma_r2', 'gamma_logLL', 'delta_b', 'delta_r2', 'bs', 'bf', 'uu0', 'ul0', 'su0', 'sl0', 'U0', 'S0', 'total0', 'use_for_dynamics', 'use_for_transition'
uns: 'clusters_coarse_colors', 'clusters_colors', 'day_colors', 'neighbors', 'pca', 'pp', 'velocyto_SVR', 'PCs', 'explained_variance_ratio_', 'pca_mean', 'pca_fit', 'feature_selection', 'cell_phase_genes', 'dynamics', 'grid_velocity_umap', 'grid_velocity_pca', 'VecFld_pca', 'VecFld_umap', 'clusters_graph', 'jacobian_pca', 'rank_acceleration', 'rank_abs_acceleration'
obsm: 'X_pca', 'X_umap', 'X', 'cell_cycle_scores', 'velocity_umap', 'velocity_pca', 'velocity_pca_SparseVFC', 'X_pca_SparseVFC', 'velocity_umap_SparseVFC', 'X_umap_SparseVFC', 'acceleration_pca'
layers: 'spliced', 'unspliced', 'X_unspliced', 'X_spliced', 'M_u', 'M_uu', 'M_s', 'M_us', 'M_ss', 'velocity_S', 'acceleration'
obsp: 'distances', 'connectivities', 'moments_con', 'pearson_transition_matrix', 'pca_ddhodge', 'umap_ddhodge'
adata.uns['rank_acceleration'][:5]
| Alpha | Beta | Delta | Ductal | Epsilon | Ngn3 high EP | Ngn3 low EP | Pre-endocrine | |
|---|---|---|---|---|---|---|---|---|
| 0 | Ghrl | Ins1 | Cck | Hmgb2 | Tmsb4x | Chga | Neurog3 | Tmsb4x |
| 1 | Cck | Ins2 | Ins1 | Ccnb2 | Gcg | Chgb | Tmsb4x | Pyy |
| 2 | Mdk | Xist | Cdkn1a | Cdc20 | Nkx6-1 | Tm4sf4 | Btbd17 | Iapp |
| 3 | Rbp4 | Krt8 | Mdk | Hist1h2bc | Rps19 | Akr1c19 | Mdk | Calr |
| 4 | Serpina1c | Gpx3 | Krt7 | Dynll1 | Tmsb10 | Cryba2 | Btg2 | Ppp1r14a |
# Enrichment analysis allows us to see if top ranking genes are significantly enriched in
# certain biological pathways. For example, when we pick the top 100 accelerating genes
# in Ductal cells, we found that they are highly enriched in cell cycle related pathways.
!pwd
/home/zongxu/Misc/lesson/scRNA-application
enr = dyn.ext.enrichr(
adata.uns['rank_acceleration']['Ductal'][:100].to_list(),
organism='mouse',
gene_sets='GO_Biological_Process_2021',
outdir='./figure'
)
enr.results.head(3)
| Gene_set | Term | Overlap | P-value | Adjusted P-value | Old P-value | Old Adjusted P-value | Odds Ratio | Combined Score | Genes | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GO_Biological_Process_2021 | mitotic spindle organization (GO:0007052) | 20/157 | 7.135193e-23 | 6.507296e-20 | 0 | 0 | 36.063869 | 1839.055984 | NDE1;BUB1B;CDCA8;KIF23;DYNLL1;RANGAP1;PMF1;AUR... |
| 1 | GO_Biological_Process_2021 | microtubule cytoskeleton organization involved... | 17/128 | 7.210195e-20 | 3.287849e-17 | 0 | 0 | 36.515033 | 1609.444116 | NDE1;BUB1B;CDCA8;DYNLL1;RANGAP1;PMF1;AURKA;CDC... |
| 2 | GO_Biological_Process_2021 | mitotic sister chromatid segregation (GO:0000070) | 13/102 | 3.535785e-15 | 1.074879e-12 | 0 | 0 | 33.261397 | 1106.800974 | CDCA8;KIF23;KIF22;SMC4;SMC2;CENPE;CCNB1;KIFC1;... |
from gseapy.plot import barplot, dotplot
barplot(enr.res2d, column = 'Adjusted P-value',title='GO_Biological_Process_2018', cutoff=0.05)
dotplot(enr.res2d, title='KEGG_2016',cmap='viridis_r', cutoff=0.05)
plt.show()
import sys
import pkg_resources
print("### Session Info ###\n")
loaded_pkgs = []
for mod_name, module in sys.modules.items():
file_path = getattr(module, "__file__", None)
if (
file_path
and "site-packages" in file_path
and not mod_name.startswith("_")
and "." not in mod_name
):
try:
version = pkg_resources.get_distribution(mod_name).version
loaded_pkgs.append((mod_name, version))
except Exception:
pass
for name, version in sorted(set(loaded_pkgs)):
print(f"{name:<15} v{version}")
### Session Info ### IPython v8.29.0 anndata v0.10.9 asttokens v2.4.1 certifi v2024.8.30 cffi v1.17.1 charset_normalizer v3.4.0 colorama v0.4.6 colorcet v3.1.0 comm v0.2.2 cycler v0.12.1 debugpy v1.8.7 decorator v5.1.1 executing v2.1.0 fontTools v4.54.1 gseapy v1.1.3 h5py v3.12.1 idna v3.10 igraph v0.11.6 ipykernel v6.29.5 jedi v0.19.1 joblib v1.4.2 jupyter_client v8.6.3 jupyter_core v5.7.2 kiwisolver v1.4.7 legacy_api_wrap v1.4 llvmlite v0.43.0 louvain v0.8.0 matplotlib v3.7.3 matplotlib_inline v0.1.7 more_itertools v10.5.0 natsort v8.4.0 networkx v3.4 numba v0.60.0 numdifftools v0.9.41 numpy v1.23.5 packaging v24.1 pandas v2.2.3 parso v0.8.4 patsy v0.5.6 pexpect v4.9.0 pickleshare v0.7.5 platformdirs v4.3.6 prompt_toolkit v3.0.48 psutil v6.1.0 ptyprocess v0.7.0 pure_eval v0.2.3 pycparser v2.22 pygments v2.18.0 pynndescent v0.5.13 pyparsing v3.2.0 pytz v2024.1 requests v2.32.3 scanpy v1.10.3 scipy v1.13.1 seaborn v0.13.2 six v1.16.0 stack_data v0.6.2 statsmodels v0.14.4 texttable v1.7.0 threadpoolctl v3.5.0 tornado v6.4.1 tqdm v4.66.6 traitlets v5.14.3 typing_extensions v4.12.2 urllib3 v2.2.3 wcwidth v0.2.13